Search AI Products and News

Explore worldwide AI information, discover new AI opportunities

2025-04-09 08:38:16.AIbase

Groundbreaking Advancements in AI Avatars: Talking Digital Twins Reshaping the Future of Human-Computer Interaction

Recent breakthroughs in generative AI have enabled AI avatars to not only possess lifelike appearances but also speak naturally and fluently. This technology, incorporating cutting-edge speech synthesis and facial expression generation capabilities, is rapidly blurring the lines between the digital and physical worlds, propelling AI from a behind-the-scenes tool to a direct conversational partner with humans. The emergence of these AI avatars marks a crucial step in the convergence of generative AI technologies. By seamlessly integrating highly realistic facial animation with natural speech synthesis, these avatars offer unprecedented potential for revolutionizing communication and interaction.

2025-04-03 08:23:40.AIbase

ByteDance Releases MegaTTS3 on Hugging Face: A Breakthrough in Lightweight Speech Synthesis

Beijing: ByteDance recently released its latest text-to-speech (TTS) model, MegaTTS3, on Hugging Face, the open-source AI community. The release quickly drew attention from AI researchers and developers worldwide for its lightweight design and multilingual support. According to community feedback and official information, MegaTTS3 is regarded as a significant advance in speech synthesis. MegaTTS3's core highlights are...

2025-03-14 10:53:41.AIbase

Sesame Releases CSM Model: Real-time Emotion-Customized AI Speech Synthesis Reaches New Heights

On March 13th, Sesame unveiled its latest speech synthesis model, CSM, attracting significant industry attention. According to the official introduction, CSM adopts an end-to-end Transformer-based multimodal learning architecture. It understands contextual information to generate natural and emotionally rich speech with stunningly realistic sound. The model supports real-time speech generation, processing both text and audio inputs. Users can also control features such as tone, intonation, rhythm, and emotion by adjusting parameters, showcasing high flexibility. CSM is considered a breakthrough in AI speech technology.

2025-03-06 11:29:22.AIbase

Spark-TTS: A Text-to-Speech System Supporting Zero-Shot Voice Cloning and Fine-grained Control

2025-03-03 11:37:51.AIbase

Sesame Releases CSM Voice Model: Transcending the Uncanny Valley with Globally Stunning Realism

Sesame's newly released Conversational Speech Model (CSM) has sparked heated discussion on X, where it is lauded as a voice model that sounds "just like a real person." Users describe its naturalness and emotional expressiveness as indistinguishable from human speech, and the model is claimed to have overcome the uncanny valley effect in voice technology. As demonstration videos and user feedback spread, CSM is rapidly becoming a leader in AI voice technology.

2024-11-22 15:28:38.AIbase

Meta's Latest Audio Model SPIRIT LM: Making AI Not Just Talk, But Also Express Emotion!

Recently, Meta AI open-sourced a foundational multimodal language model named SPIRIT LM, which can freely mix text and speech, opening new possibilities for multimodal tasks involving audio and text. SPIRIT LM is based on a pre-trained text language model with 7 billion parameters, which has been continuously trained on text and speech units, expanding into the speech modality. It can understand and generate text like a large text model, while also being capable of understanding and generating speech, and even mixing text and speech to create various forms of expression.

2024-11-06 11:24:56.AIbase

OuteTTS-0.1-350M: A Novel Text-to-Speech Synthesis Method with Zero-Shot Voice Cloning Capability

Recently, Oute AI released a novel text-to-speech synthesis method called OuteTTS-0.1-350M. This method utilizes pure language modeling without the need for external adapters or complex architectures, offering a simplified TTS approach. OuteTTS-0.1-350M is based on the LLaMa architecture, using WavTokenizer to directly generate audio tokens, making the process more efficient. The model features zero-shot voice cloning capability, requiring only a few seconds of reference audio.

2024-09-25 14:22:11.AIbase

Google's New Voice Cloning Technology: Voice Cloning with Just a Few Seconds of Audio Sample

Speech synthesis is advancing rapidly, particularly in restoring lost voices. Google researchers recently introduced a technique called 'Zero-shot Voice Transfer', which can be integrated directly with state-of-the-art text-to-speech (TTS) systems to help people who have lost their voices to illness or accident regain their 'voice memory'. The core of the technique is its 'zero-shot' capability: it does not need a large number of samples to reproduce a voice.

2024-09-24 15:40:26.AIbase

ByteDance Volcano Engine Launches Doubao Music Model and Simultaneous Interpretation Model

At today's 2024 Volcano Engine AI Innovation Tour, ByteDance launched the Doubao Music Model and the Doubao Simultaneous Interpretation Model alongside its video generation model, and announced significant upgrades to the Doubao General Model Pro, Text-to-Image Model, Speech Synthesis Model, and other specialized models. The Doubao Music Model signals Volcano Engine's commitment to music creation. The model is described as enabling high-quality music creation, and for lyrics it can quickly generate emotional lyrics from just a few input words.

2024-09-13 11:13:24.AIbase

Fish Speech 1.4 Released: Open Source TTS Model Achieves Multilingual Breakthrough

The release of Fish Speech 1.4 marks a significant breakthrough in multilingual support and performance for this open-source text-to-speech (TTS) model. As an innovative solution committed to providing high-quality and natural-sounding speech synthesis experiences, Fish Speech demonstrates its strong technical capabilities and broad application prospects in this update. The most notable feature of Fish Speech 1.4 is its powerful multilingual support capability: the training data volume has doubled, enhancing the model's

2024-08-02 09:51:19.AIbase

The Future Is Here! Alibaba's New Voice Technology CosyVoice Makes AI Speak More Naturally

Alibaba's latest CosyVoice speech synthesis model and SenseVoice speech recognition model together form the FunAudioLLM framework, aimed at enhancing human-computer interaction experience. CosyVoice, with its realistic voice generation capability, can mimic voices of different genders, ages, and personalities, adding emotions and styles while even simulating natural features such as laughter, coughing, and breathing. SenseVoice focuses on high-precision multilingual speech recognition, emotion detection, and audio event detection, supporting over 50 languages.

2024-07-26 08:38:44.AIbase

ByteDance Releases Doubao Image Generation Model: Doubao Large Model's Daily Token Usage Exceeds 50 Billion

2024-01-22 16:20:15.AIbase

AI Voice Company ElevenLabs Completes $80 Million Series B Funding

AI speech synthesis startup ElevenLabs has completed $80 million in Series B funding led by Andreessen Horowitz, Nat Friedman, and others. The funding will be used for product development, expanding infrastructure and teams, AI research, and enhancing security measures. ElevenLabs is valued at over $1 billion.

2023-11-14 14:05:43.AIbase

Bilibili Uploader 'XiaoChongGe_' Clones Genshin Impact Character Voices Using VITS Speech Synthesis Model

Bilibili uploader 'XiaoChongGe_' released a fan-made Genshin Impact video featuring a character imitation of Frowning Na that retains the original voice; it has drawn over 1.36 million views. The creator used the VITS speech synthesis model to process uploaded voice text and extract language features, preserving the distinctive vocal characteristics of the Genshin Impact character. AI voice cloning has become widespread, with tools like HeyGen and AI Dubbing lowering the barrier to entry. These advances have drawn creators' attention and underline the importance of producing work that is distinctive in both content and form. The article discusses the principles of the VITS model and briefly explains speech synthesis.

2023-11-10 14:01:01.AIbase

NetEase Youdao Launches Open Source Speech Synthesis Engine 'Yimosheng' Supporting Over 2000 Voice Tones

NetEase Youdao has launched the 'Yimosheng' open source speech synthesis engine, supporting both Chinese and English languages with over 2000 different voice tones. This engine features a distinctive emotional synthesis capability, able to synthesize voices that convey a wide range of emotions including happiness, excitement, sadness, and anger. Users can download and use it for free on GitHub and implement emotional synthesis and applications through the provided interfaces and script APIs. This project aims to help developers and content creators expand the application scope of high-quality TTS. NetEase Youdao has also introduced voice customization and voice replication.
